53 research outputs found

    Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain

    Full text link
    Human action recognition has become one of the most active fields of research in computer vision due to its wide range of applications, such as surveillance, medical and industrial environments, and smart homes. Recently, deep learning has been successfully used to learn powerful and interpretable features for recognizing human actions in videos. Most existing deep learning approaches are designed to process video information as RGB image sequences, so a preliminary decoding step is required, since video data are usually stored in a compressed format. However, decoding a video demands a high computational load and memory usage. To overcome this problem, we propose a deep neural network capable of learning straight from compressed video. Our approach was evaluated on two public benchmarks, the UCF-101 and HMDB-51 datasets, demonstrating recognition performance comparable to state-of-the-art methods while running up to 2 times faster in terms of inference speed.
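
    The network details are not given in the abstract above; purely as an illustration of the general idea of classifying actions from frequency-domain data rather than decoded RGB frames, the sketch below stacks per-frame 8x8-block DCT coefficient maps into a (channels, time, height, width) tensor and runs a tiny 3D-convolutional classifier. The tensor layout, layer sizes and the ToyFrequencyDomainNet name are assumptions, not the authors' architecture.

        # Illustrative only: a toy classifier over per-frame DCT coefficient maps.
        # Assumed input layout: (batch, 64 DCT channels, T frames, H/8, W/8),
        # i.e. each 8x8 block of a frame contributes one 64-dim coefficient vector.
        import torch
        import torch.nn as nn

        class ToyFrequencyDomainNet(nn.Module):
            def __init__(self, num_classes=101, dct_channels=64):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv3d(dct_channels, 128, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.AdaptiveAvgPool3d(1),           # global spatio-temporal pooling
                )
                self.classifier = nn.Linear(128, num_classes)

            def forward(self, x):                      # x: (B, 64, T, H/8, W/8)
                return self.classifier(self.features(x).flatten(1))

        # Example: 8 frames of a 224x224 video -> 28x28 blocks of 64 coefficients.
        dct_frames = torch.randn(2, 64, 8, 28, 28)     # placeholder for real DCT data
        print(ToyFrequencyDomainNet()(dct_frames).shape)   # torch.Size([2, 101])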

    Edited nearest neighbour for selecting keyframe summaries of egocentric videos

    Get PDF
    A keyframe summary of a video must be concise, comprehensive and diverse. Current video summarisation methods may not be able to enforce diversity of the summary if the events have highly similar visual content, as is the case of egocentric videos. We cast the problem of selecting a keyframe summary as a problem of prototype (instance) selection for the nearest neighbour classifier (1-nn). Assuming that the video is already segmented into events of interest (classes) and represented as a dataset in some feature space, we propose a Greedy Tabu Selector algorithm (GTS) which picks one frame to represent each class. An experiment with the UT (Egocentric) video database and seven feature representations illustrates the proposed keyframe summarisation method. GTS leads to an improved match with the user ground truth compared to the closest-to-centroid baseline summarisation method. The best results were obtained with feature spaces derived from a convolutional neural network (CNN).
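
    The abstract fully specifies the closest-to-centroid baseline it compares against (one keyframe per pre-segmented event), so a minimal sketch of that baseline is given below; GTS itself is not reproduced here because its tabu-search details are not described in the abstract. The frame features are assumed to have been extracted already.

        # Closest-to-centroid keyframe baseline: for each event (class), return the
        # index of the frame whose feature vector is nearest to the event centroid.
        import numpy as np

        def closest_to_centroid_summary(features, event_labels):
            """features: (n_frames, d) array; event_labels: (n_frames,) int array."""
            summary = {}
            for event in np.unique(event_labels):
                idx = np.flatnonzero(event_labels == event)
                centroid = features[idx].mean(axis=0)
                dists = np.linalg.norm(features[idx] - centroid, axis=1)
                summary[int(event)] = int(idx[np.argmin(dists)])
            return summary  # event id -> selected frame index

        # Toy usage: random "CNN features" for 100 frames split into 4 events.
        feats = np.random.rand(100, 512)
        labels = np.repeat(np.arange(4), 25)
        print(closest_to_centroid_summary(feats, labels))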

    How Far Can We Get with Neural Networks Straight from JPEG?

    Full text link
    Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, defining the state of the art in several computer vision tasks. CNNs are capable of learning robust representations of the data directly from RGB pixels. However, most image data are available in compressed format, of which JPEG is the most widely used for transmission and storage purposes, and this demands a preliminary decoding process with a high computational load and memory usage. For this reason, deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years. These methods adapt typical CNNs to work on the compressed domain, but the common architectural modifications lead to an increase in computational complexity and in the number of parameters. In this paper, we investigate CNNs that are designed to work directly with the DCT coefficients available in JPEG compressed images, proposing handcrafted and data-driven techniques for reducing the computational complexity and the number of parameters of these models in order to keep their computational cost similar to that of their RGB baselines. We conduct initial ablation studies on a subset of ImageNet to analyse the impact of different frequency ranges, image resolution, JPEG quality and classification task difficulty on the performance of the models. Then, we evaluate the models on the complete ImageNet dataset. Our results indicate that DCT models can achieve good performance, and that it is possible to reduce their computational complexity and number of parameters while retaining similar classification accuracy through the use of our proposed techniques. Comment: arXiv admin note: substantial text overlap with arXiv:2012.1372
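
    As a rough illustration of the kind of input such models consume, the sketch below computes 8x8 block DCT coefficients with SciPy from an already-decoded grayscale image and keeps only the k lowest-frequency coefficients in zig-zag order, mimicking the frequency-range reduction discussed above. It is not the paper's pipeline: real compressed-domain models read the coefficients directly from the JPEG bitstream instead of recomputing them, and the choice k = 16 is arbitrary.

        # Illustrative only: 8x8 block DCT "channels" from a decoded grayscale image,
        # truncated to the k lowest frequencies (zig-zag order) to shrink the input.
        import numpy as np
        from scipy.fft import dctn

        ZIGZAG = sorted(((i, j) for i in range(8) for j in range(8)),
                        key=lambda p: (p[0] + p[1],
                                       p[0] if (p[0] + p[1]) % 2 else p[1]))

        def block_dct_channels(img, k=16):
            """img: (H, W) array, H and W multiples of 8 -> (k, H//8, W//8) array."""
            h, w = img.shape
            out = np.empty((k, h // 8, w // 8), dtype=np.float32)
            for bi in range(h // 8):
                for bj in range(w // 8):
                    block = dctn(img[bi*8:(bi+1)*8, bj*8:(bj+1)*8], norm="ortho")
                    out[:, bi, bj] = [block[i, j] for i, j in ZIGZAG[:k]]
            return out

        img = np.random.rand(224, 224).astype(np.float32)  # stand-in for a real image
        print(block_dct_channels(img, k=16).shape)         # (16, 28, 28)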

    Tightening Classification Boundaries in Open Set Domain Adaptation through Unknown Exploitation

    Full text link
    Convolutional Neural Networks (CNNs) have brought revolutionary advances to many research areas due to their capacity to learn from raw data. However, when these methods are applied to non-controllable environments, many factors can degrade the model's expected performance, such as unlabeled datasets with different levels of domain shift and category shift. When both issues occur at the same time, we tackle this challenging setup as an Open Set Domain Adaptation (OSDA) problem. In general, existing OSDA approaches focus only on aligning the known classes or, if they already extract possible negative instances, use them as a new category learned with supervision during training. We propose a novel way to improve OSDA approaches by extracting a high-confidence set of unknown instances and using it as a hard constraint to tighten the classification boundaries of OSDA methods. In particular, we adopt a new loss constraint evaluated in three different ways: (1) directly with the pristine negative instances; (2) with randomly transformed negatives obtained through data augmentation; and (3) with synthetically generated negatives containing adversarial features. We assessed all approaches in an extensive set of experiments based on OVANet, observing consistent improvements on two public benchmarks, the Office-31 and Office-Home datasets, with absolute gains of up to 1.3% in both Accuracy and H-Score on Office-31, and 5.8% in Accuracy and 4.7% in H-Score on Office-Home.
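
    OVANet's objective and the exact constraint are not reproduced here; the sketch below only illustrates variant (1), harvesting high-confidence unknown target samples by thresholding a per-sample "unknownness" score and penalising any known-class probability mass they still receive. The scoring function, the 0.9 threshold and the loss form are illustrative assumptions.

        # Illustrative only: select confidently-unknown target samples and add a
        # loss term that keeps their known-class probabilities low (a soft stand-in
        # for the hard constraint described above).
        import torch
        import torch.nn.functional as F

        def unknown_constraint_loss(logits, unknown_score, threshold=0.9):
            """logits: (N, num_known) known-class logits of unlabeled target samples.
            unknown_score: (N,) score in [0, 1], high = likely unknown (assumed given).
            """
            mask = unknown_score > threshold           # high-confidence unknowns
            if not mask.any():
                return logits.new_zeros(())
            probs = F.softmax(logits[mask], dim=1)
            # Penalise the largest known-class probability of each selected negative.
            return probs.max(dim=1).values.mean()

        # Toy usage: 8 target samples, 5 known classes.
        print(unknown_constraint_loss(torch.randn(8, 5), torch.rand(8)))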

    Budget-Aware Pruning: Handling Multiple Domains with Less Parameters

    Full text link
    Deep learning has achieved state-of-the-art performance on several computer vision tasks and domains. Nevertheless, it still has a high computational cost and demands a significant number of parameters. Such requirements hinder its use in resource-limited environments and demand both software and hardware optimization. Another limitation is that deep models are usually specialized in a single domain or task, requiring new parameters to be learned and stored for each new one. Multi-Domain Learning (MDL) attempts to solve this problem by learning a single model that is capable of performing well in multiple domains; nevertheless, such models are usually larger than the baseline for a single domain. This work tackles both of these problems: our objective is to prune models capable of handling multiple domains according to a user-defined budget, making them more computationally affordable while keeping a similar classification performance. We achieve this by encouraging all domains to use a similar subset of filters from the baseline model, up to the amount defined by the user's budget. Filters that are not used by any domain are then pruned from the network. The proposed approach innovates by better adapting to resource-limited devices while, to our knowledge, being the only work that handles multiple domains at test time with fewer parameters and lower computational complexity than the baseline model for a single domain. Comment: arXiv admin note: substantial text overlap with arXiv:2210.0810
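
    A minimal sketch of the selection idea follows, assuming a precomputed per-domain filter-importance score: each domain keeps a budgeted number of filters, with a small bonus for filters already chosen by other domains, and whatever no domain selects is marked for pruning. The scoring, the overlap bonus and the 50% budget are illustrative assumptions, not the paper's exact criterion.

        # Illustrative only: budgeted, sharing-biased filter selection per domain,
        # followed by marking filters that no domain uses for pruning.
        import numpy as np

        def select_shared_filters(importance, budget=0.5, overlap_bonus=0.1):
            """importance: (n_domains, n_filters) nonnegative scores."""
            n_domains, n_filters = importance.shape
            keep = np.zeros_like(importance, dtype=bool)
            k = int(round(budget * n_filters))         # per-domain filter budget
            usage = np.zeros(n_filters)                # how many domains chose each filter
            for d in range(n_domains):
                score = importance[d] + overlap_bonus * usage
                chosen = np.argsort(score)[-k:]
                keep[d, chosen] = True
                usage[chosen] += 1
            prune = ~keep.any(axis=0)                  # used by no domain at all
            return keep, prune

        keep, prune = select_shared_filters(np.random.rand(3, 64), budget=0.5)
        print(keep.sum(axis=1), prune.sum())           # 32 filters per domain, #pruned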

    Budget-Aware Pruning for Multi-Domain Learning

    Full text link
    Deep learning has achieved state-of-the-art performance on several computer vision tasks and domains. Nevertheless, it still has a high computational cost and demands a significant number of parameters. Such requirements hinder its use in resource-limited environments and demand both software and hardware optimization. Another limitation is that deep models are usually specialized in a single domain or task, requiring new parameters to be learned and stored for each new one. Multi-Domain Learning (MDL) attempts to solve this problem by learning a single model that is capable of performing well in multiple domains; nevertheless, such models are usually larger than the baseline for a single domain. This work tackles both of these problems: our objective is to prune models capable of handling multiple domains according to a user-defined budget, making them more computationally affordable while keeping a similar classification performance. We achieve this by encouraging all domains to use a similar subset of filters from the baseline model, up to the amount defined by the user's budget. Filters that are not used by any domain are then pruned from the network. The proposed approach innovates by better adapting to resource-limited devices while, to our knowledge, being the only work that is capable of handling multiple domains at test time with fewer parameters and lower computational complexity than the baseline model for a single domain.
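
    Mechanically, pruning the filters that no domain uses amounts to rebuilding each convolutional layer with only the kept output channels; a minimal PyTorch sketch is below, with illustrative layer sizes (downstream layers would need their input channels reduced accordingly, which is omitted here).

        # Illustrative only: rebuild a Conv2d keeping only the selected filters
        # (output channels); the layer sizes are arbitrary.
        import torch
        import torch.nn as nn

        def prune_conv_filters(conv, kept_idx):
            """conv: nn.Conv2d; kept_idx: 1-D LongTensor of output channels to keep."""
            pruned = nn.Conv2d(conv.in_channels, len(kept_idx),
                               kernel_size=conv.kernel_size, stride=conv.stride,
                               padding=conv.padding, bias=conv.bias is not None)
            with torch.no_grad():
                pruned.weight.copy_(conv.weight[kept_idx])
                if conv.bias is not None:
                    pruned.bias.copy_(conv.bias[kept_idx])
            return pruned

        conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        kept = torch.arange(0, 128, 2)                 # e.g. keep every other filter
        print(prune_conv_filters(conv, kept).weight.shape)  # torch.Size([64, 64, 3, 3])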

    Climate change, in the framework of the constructal law

    Get PDF
    Here we present a simple and transparent alternative to the complex models of Earth's thermal behavior under time-changing conditions. We show the one-to-one relationship between changes in atmospheric properties and time-dependent changes in temperature and its distribution on Earth. The model accounts for convection and radiation, thermal inertia and changes in albedo (ρ) and greenhouse factor (γ). The constructal law is used as the principle that governs the evolution of flow configuration in time, and provides closure for the equations that describe the model. In the first part of the paper, the predictions are tested against the current thermal state of Earth. Next, the model showed that for two time-dependent scenarios, (δρ = 0.002; δγ = 0.011) and (δρ = 0.002; δγ = 0.005), the predicted equatorial and polar temperature increases and the time scales are (ΔT_H = 1.16 K; ΔT_L = 1.11 K; 104 years) and (0.41 K; 0.41 K; 57 years), respectively. In the second part, a continuous model of temperature variation was used to predict the thermal response of the Earth's surface for changes bounded by δρ = δγ and δρ = −δγ. The results show that the global warming amplitudes and time scales are consistent with those obtained for δρ = 0.002 and δγ = 0.005. The poleward heat current reaches its maximum in the vicinity of 35° latitude, accounting for the position of the Ferrel cell between the Hadley and polar cells.
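
    The paper's two-zone convection-radiation model is not reproduced in the abstract; purely as a generic illustration of how small changes δρ and δγ translate into kelvin-scale temperature changes, one can consider a zero-dimensional balance between absorbed solar flux and outgoing longwave radiation reduced by a greenhouse factor (an assumed toy balance, not the model above):

        % Generic zero-dimensional balance (illustrative, not the paper's model):
        % absorbed solar flux = outgoing longwave flux reduced by greenhouse factor.
        (1-\rho)\,\bar q = (1-\gamma)\,\sigma T^{4}
        \quad\Rightarrow\quad
        T = \left[\frac{(1-\rho)\,\bar q}{(1-\gamma)\,\sigma}\right]^{1/4},
        \qquad
        \frac{\delta T}{T} \approx \frac{1}{4}\left(\frac{\delta\gamma}{1-\gamma}
            - \frac{\delta\rho}{1-\rho}\right).

    Taking, for illustration only, T ≈ 288 K, ρ ≈ 0.3 and γ ≈ 0.4, a δγ of a few thousandths yields a δT of a few tenths of a kelvin up to about one kelvin, the same order of magnitude as the temperature increases quoted above.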

    Multimedia geocoding: the RECOD 2014 approach

    Get PDF
    This work describes the approach proposed by the RECOD team for the Placing Task of MediaEval 2014. This task requires the definition of automatic schemes to assign geographical locations to images and videos. Our approach is based on the use of as much evidence as possible (textual, visual, and/or audio descriptors) to geocode a given image/video. We estimate the location of test items by clustering the geographic coordinates of top-ranked items in one or more ranked lists defined in terms of different criteria.
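
    The clustering step can be pictured as follows: given the geographic coordinates of the top-ranked items from one retrieval list, group coordinates lying within a fixed radius of each other and return the centroid of the largest group as the estimated location. The greedy one-pass grouping, the 1-degree radius and the plain Euclidean distance on latitude/longitude are simplifying assumptions for illustration, not the exact RECOD procedure.

        # Illustrative only: estimate a test item's location as the centroid of the
        # largest group of nearby coordinates among its top-ranked retrieved items.
        import numpy as np

        def geocode_from_top_ranked(coords, radius_deg=1.0):
            """coords: (k, 2) array-like of (lat, lon) of top-ranked items."""
            coords = np.asarray(coords, dtype=float)
            best_group = None
            for c in coords:                           # greedy one-pass grouping
                group = coords[np.linalg.norm(coords - c, axis=1) <= radius_deg]
                if best_group is None or len(group) > len(best_group):
                    best_group = group
            return best_group.mean(axis=0)             # (lat, lon) estimate

        top_ranked = [(48.85, 2.35), (48.86, 2.34), (40.71, -74.0), (48.84, 2.36)]
        print(geocode_from_top_ranked(top_ranked))     # close to (48.85, 2.35)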